Rank | Count | Beginning |
---|---|---|
106158 | 5448 | הוא |
28177 | 4961 | אני |
162423 | 3767 | זה |
87818 | 3266 | גם |
19537 | 2897 | אם |
873 | 2841 | אבל |
202489 | 2729 | לא |
267204 | 2656 | על |
181594 | 2570 | יש |
218994 | 2505 | לפי |
24752 | 2269 | אנחנו |
116961 | 2191 | היא |
270347 | 2082 | עם |
28176 | 1964 | "אני |
39254 | 1782 | את |
192533 | 1775 | כל |
9531 | 1699 | אחרי |
125629 | 1641 | הם |
230719 | 1635 | מה |
263681 | 1491 | עוד |
204185 | 1390 | לאחר |
14022 | 1296 | אין |
195700 | 1270 | כמו |
190524 | 1233 | כך |
67822 | 1162 | בנוסף, |
7124 | 1150 | אז |
96898 | 1138 | האם |
24751 | 1071 | "אנחנו |
167634 | 1051 | זו |
228989 | 1043 | מדובר |
The next four subsections show the most frequent sentence beginnings consisting of N words, for N = 1, 2, 3, 4. This subsection starts with N=1.
The most frequent word-N-grams at the beginning of sentences give some insight into sentence composition.
For N=1 in particular, even a small corpus is enough to identify the most frequent sentence beginnings.
select substring_index(sentence, ' ', 1) as beg, count(*) as cnt from sentences group by substring_index(sentence, ' ', 1) order by cnt desc limit 50;
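The same counting can be sketched outside the database. The following Python version is only an illustration and not part of the original pipeline: the function names and the English sample sentences are hypothetical, and it assumes the sentences are already available as a list of strings. It mirrors `substring_index(sentence, ' ', N)` by keeping everything before the N-th space, so it works for N=1 as well as for larger N.

```python
from collections import Counter


def sentence_beginning(sentence: str, n: int = 1) -> str:
    """Return the text before the n-th space, like MySQL's
    substring_index(sentence, ' ', n). If the sentence contains
    fewer than n spaces, the whole sentence is returned."""
    return " ".join(sentence.split(" ")[:n])


def most_frequent_beginnings(sentences, n=1, limit=50):
    """Count sentence beginnings of n words and return the `limit`
    most frequent ones as (beginning, count) pairs, highest count first."""
    counts = Counter(sentence_beginning(s, n) for s in sentences)
    return counts.most_common(limit)


# Hypothetical sample data, standing in for the `sentences` table.
sample = [
    "He went home.",
    "He is here.",
    "He left.",
    "I am here.",
    "I agree.",
    '"I said so."',
]

print(most_frequent_beginnings(sample, n=1, limit=3))
# → [('He', 3), ('I', 2), ('"I', 1)]
```

Because the split is on spaces only, punctuation stays attached to the first word. This is consistent with the table above, where a word with a leading quotation mark and the same word without it are counted as separate beginnings.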